Fix QNN runner KV cache bitwidth detection in Android JNI #18731
Closed
abhinaykukkadapu wants to merge 1 commit into main
Conversation
Summary: The QNN runner in the Android JNI layer was hardcoded to use Runner<uint16_t>, but models can be exported with either 8-bit or 16-bit KV caches. This mismatch caused the KV cache data to be misinterpreted, resulting in gibberish output in the Android demo app while the same model worked correctly via the CLI runner. This change mirrors the dynamic KV bitwidth detection already present in qnn_llama_runner.cpp by querying the model's get_kv_io_bit_width method and instantiating the correct Runner<uint8_t> or Runner<uint16_t> accordingly. It also passes temperature_ to the Runner constructor, which was previously omitted. A sketch of the dispatch pattern appears below.

Fixes #18571
Closes #17622

Test Plan:
- Built Android AAR with QNN support (SDK 2.37) — jni_layer_llama.cpp compiles cleanly with both Runner<uint8_t> and Runner<uint16_t> template instantiations
- Unit tests pass (gradlew testDebugUnitTest)
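For reference, here is a minimal, self-contained sketch of the dispatch pattern the PR describes: query the model's KV cache bit width and instantiate Runner<uint8_t> or Runner<uint16_t> accordingly, forwarding the temperature to the constructor. The RunnerBase interface, make_runner helper, and constructor signature below are illustrative assumptions, not the actual ExecuTorch/JNI API; the real change lives in jni_layer_llama.cpp.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

// Stand-ins for the templated QNN Runner; names and constructor arguments
// here are simplified assumptions for illustration only.
struct RunnerBase {
  virtual ~RunnerBase() = default;
  virtual void generate(const std::string& prompt) = 0;
};

template <typename KVType>
struct Runner : RunnerBase {
  explicit Runner(float temperature) : temperature_(temperature) {}
  void generate(const std::string& prompt) override {
    std::cout << "prompt: " << prompt
              << ", KV element size: " << sizeof(KVType)
              << " byte(s), temperature: " << temperature_ << "\n";
  }
  float temperature_;
};

// Stand-in for reading the exported model's KV I/O bit width
// (the PR queries a get_kv_io_bit_width method on the model).
int64_t get_kv_io_bit_width() {
  return 16;  // 8 for 8-bit KV cache exports, 16 for 16-bit
}

// Dispatch on the detected bit width instead of hardcoding uint16_t,
// and forward the temperature to the runner constructor.
std::unique_ptr<RunnerBase> make_runner(float temperature) {
  switch (get_kv_io_bit_width()) {
    case 8:
      return std::make_unique<Runner<uint8_t>>(temperature);
    case 16:
      return std::make_unique<Runner<uint16_t>>(temperature);
    default:
      return nullptr;  // unsupported bit width: surface an error to the caller
  }
}

int main() {
  if (auto runner = make_runner(/*temperature=*/0.8f)) {
    runner->generate("hello world");
  }
}
```

Dispatching once at construction time keeps the per-token generation path free of bit-width branches; only the runner instantiation needs to know the KV cache element type.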
cc @cccclai @kirklandsign @infil00p